Background

The file ‘WeatherDownloads_202005_v002.Rmd’ contains code for dowloading and processing historical weather data as contained in METAR archives hosted by Iowa State University.

Data have been dowloaded and processed for several stations (airports) and years, with .rds files saved in “./RInputFiles/ProcessedMETAR”.

This module will perform exploratory data analysis on the processed weather files.

Data Availability

Each processed data file contains one year of hourly weather data for one station. Files are saved as ‘./RInputFiles/ProcessedMETAR/metar_kxxx_yyyy.rds’ where xxx is the three-digit airport code and yyyy is the four-digit year.

Each file contains the following variables:

  • METAR (chr) - the extracted portion of the METAR based on a regex string
  • WindDir (chr) - the previaling wind direction in degrees, stored as a character since ‘VRB’ means variable
  • WindSpeed (int) - the prevailing wind speed in knots
  • WindGust (dbl) - the wind gust speed in knots (NA if there is no recorded wind gust at that hour)
  • Dummy (chr) - artifact, always a blank space
  • Visibility (dbl) - surface visibility in statute miles
  • TempC (int) - temperature in degrees Celsius
  • DewC (int) - dew point in degrees Celsius
  • Altimeter (int) - altimeter in inches of mercury
  • SLP (int) - the raw sea-level-pressure reading from the METAR
  • FahrC (chr) - the raw temperature string pulled from the METAR (Tttttdddd) where tttt is the Fahrenheit temperature recorded in Celsius and dddd is the Fahrenheit dew point recorded in Celsius
  • dtime (dttm) - the date-time associated with the observation
  • origMETAR (chr) - the full METAR associated with the observation
  • TempF (dbl) - the Fahrenheit temperature associated with converting FahrC to Fahrenheit
  • DewF (dbl) - the Fahrenheit dew point associated with converting FahrC to Fahrenheit
  • modSLP (dbl) - Sea-Level Pressure (SLP), adjusted to reflect that SLP is recorded as 0-1000 but reflects data that are 950-1050
  • nSKC (int) - number of times ‘SKC’ (human-confirmed cloud-free) is recorded in the observation (should be 0 or 1)
  • nCLR (int) - number of times ‘CLR’ (austomated-sensor cloud-free) is recorded in the observation (should be 0 or 1, and should never have both nSKC>0 and nCLR>0)
  • cloudn (chr) - the nth cloud layer recorded in the METAR (layers begin with FEW, SCT, BKN, OVC or VV)
  • cTypen (chr) - the cloud type of the nth cloud layer (FEW, BKN, SCT, OVC, or VV)
  • cLeveln (dbl) - the cloud height in feet of the nth cloud layer
  • wType (fct) - highest level of obscuration recorded in the METAR (VV > OVC > BKN > SCT > FEW > CLR/SKC)
  • year (dbl) - year of the observation
  • monthint (dbl) - month of the observation as a number (e.g., 6=June)
  • month (fct) - month of the observation as three-character abbreviation, saved as a factor (e.g., Jun=June)
  • day (int) - day of the month of the observation

Base Functions Available

There are several functions available for analysis:

  • plotCountsByMetric() - bar plots for counts by variable

  • plotNumCor() - plot two numeric variables against each other

  • plotFactorNumeric() - boxplot a numeric variable against a factor variable

  • corMETAR() - correlations between METAR variables

  • lmMETAR() - linear regression modeling for METAR variables

  • basicWindPlots() - plot wind speed and direction

  • getWindDirGroup() - convert wind direction to a grouping (e.g., N for 320-360-40)

  • consolidatePlotWind() - show frequency plots of wind direction, city, and month

The tidyverse library is loaded and the 2016 Detroit data is read in to show examples of the functions:

library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.4
## v tidyr   1.0.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ---------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
kdtw_2016 <- readRDS("./RInputFiles/ProcessedMETAR/metar_kdtw_2016.rds")

A variable mapping is created to allow for better readable variable names:

varMapper <- c(WindDir="Wind Direction (degrees)", 
               WindSpeed="Wind Speed (kts)", 
               Visibility="Visibility (SM)", 
               TempC="Temperature (C)", 
               DewC="Dew Point (C)", 
               Altimeter="Altimeter (inches Hg)", 
               modSLP="Sea-Level Pressure (hPa)", 
               TempF="Temperature (F)",
               DewF="Dew Point (F)", 
               cType1="First Cloud Layer Type", 
               cLevel1="First Cloud Layer Height (ft)",
               month="Month", 
               wType="Greatest Sky Obscuration", 
               day="Day of Month"
               )

The function plotCountsByMetric() produces bar plots for counts by variable:

# Helper function for generating plots by key variables
plotcountsByMetric <- function(df, 
                               mets, 
                               title="", 
                               rotateOn=20, 
                               dropNA=TRUE, 
                               diagnose=FALSE,
                               mapper=varMapper,
                               facetOn=NULL, 
                               showCentral=FALSE
                               ) {
    
    # Function arguments
    # df: dataframe or tibble containing raw data
    # mets: character vector of variables for plotting counts
    # title: character vector for plot title
    # rotateOn: integer, x-axis labels will be rotated by 90 degrees if # categories >= rotateOn
    # dropNA: boolean for whether to drop all NA prior to plotting (recommended for avoiding warnings)
    # diagnose: boolean for whether to note in the log the number of NA observations dropped
    # mapper: named list containing mapping from variable name to well-formatted name for titles and axes
    # facetOn: a facetting variable for the supplied df (NULL for no faceting)
    # showCentral: boolean for whether to show the central tendency over-plotted on the main data
    
    # Function usage
    # 1.  By default, the function plots overall counts by metric for a given input
    # 2.  If facetOn is passed as a non-NULL, then the data in #1 will be facetted by facetOn
    # 3.  If showCentral=TRUE, then the overall mean will be plotted as a point on the main plot (only makes sense if facetOn has been selected)
    
    
    # Plot of counts by key metric
    for (x in mets) {
        # If a facetting variable is provided, need to include this in the group_by
        useVars <- x
        if (!is.null(facetOn)) { useVars <- c(facetOn, useVars) }
        dat <- df %>%
            group_by_at(vars(all_of(useVars))) %>%
            summarize(n=n())
        
        if (dropNA) {
            nOrig <- nrow(dat)
            sumOrig <- sum(dat$n)
            dat <- dat %>%
                filter_all(all_vars(!is.na(.)))
            if (diagnose & (nOrig > nrow(dat))) { 
                cat("\nDropping", 
                    nOrig-nrow(dat), 
                    "rows with", 
                    sumOrig-sum(dat$n), 
                    "observations due to NA\n"
                    )
            }
        }
        
        # Create the main plot
        p <- dat %>%
            ggplot(aes_string(x=x, y="n")) + 
            geom_col() + 
            labs(title=title,
                 subtitle=paste0("Counts By: ", mapper[x]), 
                 x=paste0(x, " - ", mapper[x]),
                 y="Count"
                 )
        # If the rotateOn criteria is exceeded, rotate the x-axis by 90 degrees
        if (nrow(dat) >= rotateOn) {
            p <- p + theme(axis.text.x=element_text(angle=90))
        }
        # If facetting has been requested, facet by the desired variable
        if (!is.null(facetOn)) {
            p <- p + facet_wrap(as.formula(paste("~", facetOn)))
        }
        # If showCentral=TRUE, add a dot plot for the overall average
        if (showCentral) {
            # Get the median number of observations by facet, or the total if facetOn=NULL
            if (is.null(facetOn)) {
                useN <- sum(dat$n)
            } else {
                useN <- dat %>%
                    group_by_at(vars(all_of(facetOn))) %>%
                    summarize(n=sum(n)) %>%
                    pull(n) %>%
                    median()
            }
            # Get the overall percentages by x
            centralData <- helperCountsByMetric(tbl=dat, ctVar=x, sumOn="n") %>%
                mutate(centralValue=nPct*useN)
            # Apply the median
            p <- p + geom_point(data=centralData, aes(y=centralValue), color="red", size=2)
        }
        # Print the plot
        print(p)
    }
}

# Example for Detroit 2016 - using WindDir, cType1, month, wType
plotcountsByMetric(kdtw_2016, 
                   mets=c("WindDir", "cType1", "month", "wType"), 
                   title="Detroit, MI (2016)"
                   )

The function plotNumCor() plots two numeric variables against one another:

# Create a function for plotting two variables against each other
plotNumCor <- function(met, 
                       var1, 
                       var2, 
                       title=NULL, 
                       subT="", 
                       dropNA=TRUE, 
                       diagnose=FALSE,
                       mapper=varMapper
                       ) {
    
    # Create the title if not passed
    if (is.null(title)) { 
        title <- paste0("Hourly Observations of ", mapper[var1], " and ", mapper[var2]) 
    }
    
    # Pull the counts by var1-var2
    dat <- met %>%
        group_by_at(vars(all_of(c(var1, var2)))) %>%
        summarize(n=n()) 
    
    # If NA requested to be excluded, remove anything with NA
    if (dropNA) {
        nOrig <- nrow(dat)
        sumOrig <- sum(dat$n)
        dat <- dat %>%
            filter_all(all_vars(!is.na(.)))
        if (diagnose) { 
            cat("\nDropping", nOrig-nrow(dat), "rows", "with", sumOrig-sum(dat$n), "observations due to NA\n")
        }
    }
    
    p <- dat %>%
        ggplot(aes_string(x=var1, y=var2)) + 
        geom_point(alpha=0.5, aes_string(size="n")) + 
        geom_smooth(method="lm", aes_string(weight="n")) + 
        labs(x=paste0(mapper[var1], " - ", var1), 
             y=paste0(mapper[var2], " - ", var2), 
             title=title, 
             subtitle=subT
             )
    
    print(p)
}

# Example for Detroit 2016 - using TempC and TempF
plotNumCor(kdtw_2016, var1="TempC", var2="TempF", subT="Detroit, MI (2016)", diagnose=TRUE)
## 
## Dropping 1 rows with 49 observations due to NA

# Example for Detroit 2016 - using TempC and DewC
plotNumCor(kdtw_2016, var1="TempC", var2="DewC", subT="Detroit, MI (2016)", diagnose=TRUE)
## 
## Dropping 1 rows with 49 observations due to NA

# Example for Detroit 2016 - using Altimeter and modSLP
plotNumCor(kdtw_2016, var1="Altimeter", var2="modSLP", subT="Detroit, MI (2016)", diagnose=TRUE)
## 
## Dropping 1 rows with 49 observations due to NA

The function plotFactorNumeric() creates box plots for a numeric variable against a factor variable:

# Updated function for plotting numeric by factor
plotFactorNumeric <- function(met, 
                              fctVar, 
                              numVar, 
                              title=NULL, 
                              subT, 
                              diagnose=TRUE,
                              showXLabel=TRUE,
                              mapper=varMapper
                              ) {
    
    # Create the title if not passed
    if (is.null(title)) { 
        title <- paste0("Hourly Observations of ", mapper[numVar], " by ", mapper[fctVar])
    }
    
    nOrig <- nrow(met)
    dat <- met %>%
        filter(!is.na(get(fctVar)), !is.na(get(numVar)))
    if (diagnose) { cat("\nRemoving", nOrig-nrow(dat), "records due to NA\n") }
    
    p <- dat %>%
        ggplot(aes_string(x=fctVar, y=numVar)) + 
        geom_boxplot(fill="lightblue") + 
        labs(title=title, 
             subtitle=subT, 
             x=ifelse(showXLabel, paste0(mapper[fctVar], " - ", fctVar), ""), 
             y=paste0(mapper[numVar], " - ", numVar)
             )
    
    print(p)
    
}

# Example for Detroit 2016 - using TempF and month
plotFactorNumeric(kdtw_2016, 
                  fctVar="month", 
                  numVar="TempF", 
                  subT="Detroit, MI (2016)", 
                  showXLabel=FALSE,
                  diagnose=TRUE
                  )
## 
## Removing 49 records due to NA

# Example for Detroit 2016 - using WindSpeed and wType
plotFactorNumeric(kdtw_2016, 
                  fctVar="wType", 
                  numVar="WindSpeed", 
                  subT="Detroit, MI (2016)", 
                  showXLabel=TRUE,
                  diagnose=TRUE
                  )
## 
## Removing 49 records due to NA

# Example for Detroit 2016 - using Visibility and wType
plotFactorNumeric(kdtw_2016, 
                  fctVar="wType", 
                  numVar="Visibility", 
                  subT="Detroit, MI (2016)", 
                  showXLabel=TRUE,
                  diagnose=TRUE
                  )
## 
## Removing 49 records due to NA

An issue previous observed where visibility 1/16SM was interpreted as 16 statutory miles has been corrected in the ‘WeatherDownloads_202005_v002’ file.

The function corMETAR() calculates correlations among numeric variables in the METAR data:

# Function to calculate, display, and plot variable correlations
corMETAR <- function(met, numVars, subT="") {

    # Keep only complete cases and report on data kept
    dfUse <- met %>%
        select_at(vars(all_of(numVars))) %>%
        filter(complete.cases(.))
    
    nU <- nrow(dfUse)
    nM <- nrow(met)
    myPct <- round(100*nU/nM, 1)
    cat("\n *** Correlations use ", nU, " complete cases (", myPct, "% of ", nM, " total) ***\n", sep="")
    
    # Create the correlation matrix
    mtxCorr <- dfUse %>%
        cor()

    # Print the correlations
    mtxCorr %>%
        round(2) %>%
        print()

    # Display a heat map
    corrplot::corrplot(mtxCorr, 
                       method="color", 
                       title=paste0("Hourly Weather Correlations\n", subT), 
                       mar=c(0, 0, 2, 0)
                       )
}

# Example for Detroit, MI 2016
coreNum <- c("TempC", "TempF", "DewC", "DewF", 
             "Altimeter", "modSLP", "WindSpeed", "Visibility"
             )
corMETAR(kdtw_2016, numVars=coreNum, subT="Detroit, MI (2016) METAR")
## 
##  *** Correlations use 8769 complete cases (99.4% of 8818 total) ***
##            TempC TempF  DewC  DewF Altimeter modSLP WindSpeed Visibility
## TempC       1.00  1.00  0.92  0.92     -0.17  -0.24     -0.10       0.14
## TempF       1.00  1.00  0.92  0.92     -0.17  -0.24     -0.10       0.14
## DewC        0.92  0.92  1.00  1.00     -0.22  -0.28     -0.18      -0.07
## DewF        0.92  0.92  1.00  1.00     -0.22  -0.28     -0.18      -0.07
## Altimeter  -0.17 -0.17 -0.22 -0.22      1.00   1.00     -0.37       0.15
## modSLP     -0.24 -0.24 -0.28 -0.28      1.00   1.00     -0.35       0.14
## WindSpeed  -0.10 -0.10 -0.18 -0.18     -0.37  -0.35      1.00       0.10
## Visibility  0.14  0.14 -0.07 -0.07      0.15   0.14      0.10       1.00

The function lmMETAR() runs simple linear regression models on the METAR data:

# Function for linear regressions on METAR data
lmMETAR <- function(met, 
                    y, 
                    x, 
                    yName, 
                    subT=""
                    ) {
    
    # Convert to formula
    myChar <- paste0(y, " ~ ", x)
    cat("\n *** Regression call is:", myChar, "***\n")
    
    # Run regression
    regr <- lm(formula(myChar), data=met)
    
    # Summarize regression
    print(summary(regr))
    
    # Predict the new values
    pred <- predict(regr, newdata=met)
    
    # Plot the predictions
    p <- met %>%
        select_at(vars(all_of(y))) %>%
        mutate(pred=pred) %>%
        filter_all(all_vars(!is.na(.))) %>%
        group_by_at(vars(all_of(c(y, "pred")))) %>%
        summarize(n=n()) %>%
        ggplot(aes_string(x=y, y="pred")) + 
        geom_point(aes(size=n), alpha=0.25) + 
        geom_smooth(aes(weight=n), method="lm") + 
        labs(title=paste0("Predicted vs. Actual ", yName, " - ", x, " as Predictor"), 
             subtitle=subT, 
             x=paste0("Actual ", yName), 
             y=paste0("Predicted ", yName)
             )
    print(p)
    
}

# Examples for Detroit, MI 2016
lmMETAR(kdtw_2016, "modSLP", "Altimeter", yName="Sea Level Pressure", subT="Detroit, MI (2016)")
## 
##  *** Regression call is: modSLP ~ Altimeter ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.96361 -0.44022 -0.03758  0.40772  1.41448 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -16.44389    0.74723  -22.01   <2e-16 ***
## Altimeter    34.41514    0.02488 1383.17   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4961 on 8767 degrees of freedom
##   (49 observations deleted due to missingness)
## Multiple R-squared:  0.9954, Adjusted R-squared:  0.9954 
## F-statistic: 1.913e+06 on 1 and 8767 DF,  p-value: < 2.2e-16

lmMETAR(kdtw_2016, "modSLP", "Altimeter + TempF", yName="Sea Level Pressure", subT="Detroit, MI (2016)")
## 
##  *** Regression call is: modSLP ~ Altimeter + TempF ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.57802 -0.12674  0.00481  0.12820  0.65708 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.713e+00  2.803e-01  -13.25   <2e-16 ***
## Altimeter    3.403e+01  9.302e-03 3658.80   <2e-16 ***
## TempF       -2.351e-02  9.941e-05 -236.48   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1826 on 8766 degrees of freedom
##   (49 observations deleted due to missingness)
## Multiple R-squared:  0.9994, Adjusted R-squared:  0.9994 
## F-statistic: 7.086e+06 on 2 and 8766 DF,  p-value: < 2.2e-16

lmMETAR(kdtw_2016, "TempC", "DewC", yName="Temperature (C)", subT="Detroit, MI (2016)")
## 
##  *** Regression call is: TempC ~ DewC ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.174 -3.172 -1.165  2.822 17.828 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.17084    0.05373   114.9   <2e-16 ***
## DewC         0.99909    0.00470   212.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.463 on 8767 degrees of freedom
##   (49 observations deleted due to missingness)
## Multiple R-squared:  0.8375, Adjusted R-squared:  0.8375 
## F-statistic: 4.519e+04 on 1 and 8767 DF,  p-value: < 2.2e-16

The basicWindPlots() function creates plots for wind speed and direction:

# Generate basic wind plots
basicWindPlots <- function(met, 
                           dirVar="WindDir", 
                           spdVar="WindSpeed",
                           desc="", 
                           gran="", 
                           mapper=varMapper
                           ) {

    # Plot for the wind direction
    wDir <- met %>%
        ggplot(aes_string(x=dirVar)) + 
        geom_bar() + 
        labs(title=paste0(desc, " Wind Direction"), subtitle=gran, 
             y="# Hourly Observations", x=mapper[dirVar]
             ) + 
        theme(axis.text.x=element_text(angle=90))
    print(wDir)

    # Plot for the minimum, average, and maximum wind speed by wind direction
    # Wind direction 000 is reserved for 0 KT wind, while VRB is reserved for 3-6 KT variable winds
    wSpeedByDir <- met %>%
        filter(!is.na(get(dirVar))) %>%
        group_by_at(vars(all_of(dirVar))) %>%
        summarize(minWind=min(get(spdVar)), meanWind=mean(get(spdVar)), maxWind=max(get(spdVar))) %>%
        ggplot(aes_string(x=dirVar)) +
        geom_point(aes(y=meanWind), color="red", size=2) +
        geom_errorbar(aes(ymin=minWind, ymax=maxWind)) +
        labs(title=paste0(desc, " Wind Speed (Max, Mean, Min) By Wind Direction"), 
             subtitle=gran,
             y=mapper[spdVar], 
             x=mapper[dirVar]
             ) + 
        theme(axis.text.x=element_text(angle=90))
    print(wSpeedByDir)

    # Plot for the wind speed
    pctZero <- sum(pull(met, spdVar)==0, na.rm=TRUE) / nrow(met)
    wSpeed <- met %>%
        filter_at(vars(all_of(spdVar)), all_vars(!is.na(.))) %>%
        ggplot(aes_string(x=spdVar)) +
        geom_bar(aes(y=..count../sum(..count..))) +
        labs(title=paste0(round(100*pctZero), "% of wind speeds in ", desc, " measure 0 Knots"),
             subtitle=gran,
             y="% Hourly Observations", 
             x=mapper[spdVar]
             )
    print(wSpeed)
    
    # Polar plot for wind speed and wind direction
    wData <- met %>%
        filter_at(vars(all_of(dirVar)), all_vars(!is.na(.) & !(. %in% c("000", "VRB")))) %>%
        filter_at(vars(all_of(spdVar)), all_vars(!is.na(.))) %>%
        mutate_at(vars(all_of(dirVar)), as.numeric) %>%
        group_by_at(vars(all_of(c(dirVar, spdVar)))) %>%
        summarize(n=n())
        
    wPolarDirSpeed <- wData %>%
        ggplot(aes_string(x=spdVar, y=dirVar)) +
        geom_point(alpha=0.1, aes(size=n)) +
        coord_polar(theta="y") +
        labs(title=paste0(desc, " Direction vs. Wind Speed"), 
             subtitle=gran, 
             x=mapper[spdVar], 
             y=mapper[dirVar]
             ) +
        scale_y_continuous(limits=c(0, 360), breaks=c(0, 90, 180, 270, 360)) +
        scale_x_continuous(limits=c(0, 40), breaks=c(0, 5, 10, 15, 20, 25, 30, 35, 40)) +
        geom_point(aes(x=0, y=0), color="red", size=2)
    print(wPolarDirSpeed)

}

# Example for Detroit, MI 2016
basicWindPlots(kdtw_2016, desc="Detroit, MI (2016)", gran="KDTW METAR")

The getWindDirGroup() function maps wind direction to a category such as NNE. Because the METAR data are recorded in units of 10 degrees, either 4 groupings (90 degrees each) or 12 groupings (30 degrees each) are preferred, so that each category has the same underlying number of buckets:

# Extract the wind direction data from a processed METAR file
getWindDirGroup <- function(met, src) {
    
    # Use the fullMETAR data and extract WindDir, WindSpeed, month
    windPlotData <- met %>%
        select(WindDir, WindSpeed, month) %>%
        mutate(windDirGroup=factor(case_when(WindSpeed==0 ~ "No Wind", 
                                             WindDir=="VRB" ~ "Variable", 
                                             WindDir %in% c("350", "360", "010") ~ "N", 
                                             WindDir %in% c("020", "030", "040") ~ "NNE", 
                                             WindDir %in% c("050", "060", "070") ~ "ENE", 
                                             WindDir %in% c("080", "090", "100") ~ "E", 
                                             WindDir %in% c("110", "120", "130") ~ "ESE",
                                             WindDir %in% c("140", "150", "160") ~ "SSE", 
                                             WindDir %in% c("170", "180", "190") ~ "S", 
                                             WindDir %in% c("200", "210", "220") ~ "SSW",
                                             WindDir %in% c("230", "240", "250") ~ "WSW", 
                                             WindDir %in% c("260", "270", "280") ~ "W", 
                                             WindDir %in% c("290", "300", "310") ~ "WNW", 
                                             WindDir %in% c("320", "330", "340") ~ "NNW", 
                                             TRUE ~ "Error"
                                             ), 
                                   levels=c("No Wind", "Variable", "Error", 
                                            "N", "NNE", "ENE", 
                                            "E", "ESE", "SSE", 
                                            "S", "SSW", "WSW", 
                                            "W", "WNW", "NNW"
                                            )
                                   )
               )
    
    # Rempve the errors and calculate percentages by month for the remainder
    processedWindData <- windPlotData %>%
        filter(windDirGroup != "Error") %>%
        group_by(month, windDirGroup) %>%
        summarize(n=n()) %>%
        ungroup() %>%
        group_by(month) %>%
        mutate(pct=n/sum(n)) %>%
        ungroup() %>%
        mutate(src=src)
    
    processedWindData

}

The function conslidatePlotWind() then calls getWindDirGroup() for any number of files:

# Consolidate and plot wind data
consolidatePlotWind <- function(files, names) {

    consFun <- function(x, y) { getWindDirGroup(met=x, src=y) }
    boundByRows <- map2_dfr(.x=files, .y=names, .f=consFun)

    # Show frequency by month for each city, faceted by wind direction
    p1 <- boundByRows %>%
        ggplot(aes(x=month, y=pct, color=src)) + 
        geom_line(aes(group=src)) + 
        facet_wrap(~windDirGroup) + 
        labs(title="Wind Frequency by Month", 
             x="Month", 
             y="Frequency of Wind Observations"
             ) +
        theme(axis.text.x=element_text(angle=90))
    print(p1)
    
    # Show frequency by wind direction for each city, faceted by month
    p2 <- boundByRows %>%
        ggplot(aes(x=windDirGroup, y=pct, color=src)) + 
        geom_line(aes(group=src)) + 
        facet_wrap(~month) + 
        labs(title="Wind Frequency by Wind Direction", 
             x="Wind Direction", 
             y="Frequency of Wind Observations"
             ) +
        theme(axis.text.x=element_text(angle=90))
    print(p2)
    
    boundByRows
    
}

# Load the Las Vegas data and New Orleans data for comparison
kmsy_2016 <- readRDS("./RInputFiles/ProcessedMETAR/metar_kmsy_2016.rds")
klas_2016 <- readRDS("./RInputFiles/ProcessedMETAR/metar_klas_2016.rds")

# Run wind by month comparisons for Detroit, Las Vegas, New Orleans
consolidatePlotWind(files=list(kdtw_2016, klas_2016, kmsy_2016), 
                    names=c("Detroit, MI (2016)", "Las Vegas, NV (2016)", "New Orleans, LA (2016)")
                    )

## # A tibble: 504 x 5
##    month windDirGroup     n     pct src               
##    <fct> <fct>        <int>   <dbl> <chr>             
##  1 Jan   No Wind         46 0.0600  Detroit, MI (2016)
##  2 Jan   Variable         3 0.00391 Detroit, MI (2016)
##  3 Jan   N               26 0.0339  Detroit, MI (2016)
##  4 Jan   NNE             26 0.0339  Detroit, MI (2016)
##  5 Jan   ENE              9 0.0117  Detroit, MI (2016)
##  6 Jan   E               16 0.0209  Detroit, MI (2016)
##  7 Jan   ESE             14 0.0183  Detroit, MI (2016)
##  8 Jan   SSE             72 0.0939  Detroit, MI (2016)
##  9 Jan   S              131 0.171   Detroit, MI (2016)
## 10 Jan   SSW            158 0.206   Detroit, MI (2016)
## # ... with 494 more rows

Combining Functions

The functions can be combined so that a full process can be run for a given file:

# File name to city name mapper
cityNameMapper <- c(kdtw_2016="Detroit, MI (2016)", 
                    kewr_2016="Newark, NJ (2016)",
                    kgrb_2016="Green Bay, WI (2016)",
                    kgrr_2016="Grand Rapids, MI (2016)",
                    kiah_2016="Houston, TX (2016)",
                    kind_2016="Indianapolis, IN (2016)",
                    klas_2015="Las Vegas, NV (2015)",
                    klas_2016="Las Vegas, NV (2016)", 
                    klas_2017="Las Vegas, NV (2017)", 
                    klnk_2016="Lincoln, NE (2016)",
                    kmke_2016="Milwaukee, WI (2016)",
                    kmsn_2016="Madison, WI (2016)",
                    kmsp_2016="Minneapolis, MN (2016)",
                    kmsy_2015="New Orleans, LA (2015)",
                    kmsy_2016="New Orleans, LA (2016)", 
                    kmsy_2017="New Orleans, LA (2017)", 
                    kord_2015="Chicago, IL (2015)",
                    kord_2016="Chicago, IL (2016)", 
                    kord_2017="Chicago, IL (2017)", 
                    ksan_2015="San Diego, CA (2015)",
                    ksan_2016="San Diego, CA (2016)",
                    ksan_2017="San Diego, CA (2017)",
                    ktvc_2016="Traverse City, MI (2016)"
                    )

# This is a helper function to create a locale description
getLocaleDescription <- function(x, mapper=cityNameMapper) {
    
    # Initialize the description as NULL
    desc <- NULL
    
    for (potMatch in names(mapper)) {
        if (str_detect(string=x, pattern=potMatch)) {
            desc <- mapper[potMatch]
            break
        }
    }
    
    # If the mapping failed, use UNMAPPED_x as the description
    if (is.null(desc)) {
        desc <- paste0("UNMAPPED_", x)
        cat("\nUnable to find a description, will use ", desc, "\n\n", sep="")
    } else {
        cat("\nWill use ", desc, " as the description for ", x, "\n\n", sep="")
    }
    
    # Return the descriptive name
    desc
    
}

# The following function runs the functions that work on a single data source
combinedEDA <- function(filename=NULL, 
                        tbl=NULL,
                        desc=NULL,
                        mets=c("WindDir", "WindSpeed", "TempC", "DewC", "Altimeter", 
                               "modSLP", "cType1", "cLevel1", "month", "day"
                               ),
                        corPairs=list(c("TempC", "TempF"), 
                                      c("TempC", "DewC"), 
                                      c("Altimeter", "modSLP"), 
                                      c("Altimeter", "WindSpeed")
                                      ),
                        fctPairs=list(c("month", "TempF"), 
                                      c("month", "DewF"), 
                                      c("month", "WindSpeed"), 
                                      c("month", "Altimeter"), 
                                      c("wType", "Visibility"), 
                                      c("wType", "WindSpeed"), 
                                      c("WindDir", "WindSpeed"), 
                                      c("WindDir", "TempF")
                                      ),
                        heatVars=c("TempC", "TempF", 
                                   "DewC", "DewF", 
                                   "Altimeter", "modSLP", 
                                   "WindSpeed", "Visibility", 
                                   "monthint", "day"
                                   ),
                        lmVars=list(c("modSLP", "Altimeter"), 
                                    c("modSLP", "Altimeter + TempF"), 
                                    c("TempF", "DewF"), 
                                    c("WindSpeed", "Altimeter + TempF + DewF + month")
                                    ),
                        mapVariables=varMapper,
                        mapFileNames=cityNameMapper,
                        path="./RInputFiles/ProcessedMETAR/"
                        ) {
    
    # Require that either filename OR tbl be passed
    if (is.null(filename) & is.null(tbl)) {
        cat("\nMust provide either a filename or an already-loaded tibble\n")
        stop("\nfilename=NULL and tbl=NULL may not both be passed to combinedEDA()\n")
    }
    
    # Require that either 1) filename and mapFileNames, OR 2) desc be passed
    if ((is.null(filename) | is.null(mapFileNames)) & is.null(desc)) {
        cat("\nMust provide either a filename with mapFileNames or a file description\n")
        stop("\nWhen desc=NULL must have non-null entries for both filename= and mapFileNames=\n")
    }
    
    # Find the description if it is NULL (default)
    if (is.null(desc)) {
        desc <- getLocaleDescription(filename, mapper=mapFileNames)
    }
    
    # Warn if both filename and tbl are passed, since tbl will be used
    if (!is.null(filename) & !is.null(tbl)) {
        cat("\nA tibble has been passed and will be used as the dataset for this function\n")
        warning("\nArgument filename=", filename, " is NOT loaded since a tibble was passed\n")
    }
    
    # Read in the file unless tbl has already been passed to the routine
    if (is.null(tbl)) {
        tbl <- readRDS(paste0(path, filename))
    }
    
    # Plot counts by metric
    plotcountsByMetric(tbl, mets=mets, title=desc, diagnose=TRUE)
    
    # Plot relationships between two variables
    for (ys in corPairs) {
        plotNumCor(tbl, var1=ys[1], var2=ys[2], subT=desc, diagnose=TRUE)
    }
    
    # plot numeric vs. factor
    for (ys in fctPairs) {
        plotFactorNumeric(tbl, fctVar=ys[1], numVar=ys[2], subT=desc, showXLabel=FALSE, diagnose=TRUE)
    }
    
    # Heatmap for variable correlations
    corMETAR(tbl, numVars=heatVars, subT=paste0(desc, " METAR"))

    # Run linear rergression
    for (ys in lmVars) {
        lmMETAR(tbl, y=ys[1], x=ys[2], yName=varMapper[ys[1]], subT=desc)
    }
    
    # Run the basic wind plots
    basicWindPlots(tbl, desc=desc, gran="")
    
    # Return the tibble
    tbl
    
}

The process can then be run for each of Detroit (2016), Las Vegas (2016), and New Orleans (2016):

# Retrieve the Detroit, MI (2016) data
kdtw_2016 <- combinedEDA("metar_kdtw_2016.rds")
## 
## Will use Detroit, MI (2016) as the description for metar_kdtw_2016.rds
## 
## 
## Dropping 1 rows with 49 observations due to NA

## 
## Dropping 1 rows with 49 observations due to NA

## 
## Dropping 1 rows with 49 observations due to NA

## 
## Dropping 1 rows with 49 observations due to NA

## 
## Dropping 1 rows with 49 observations due to NA

## 
## Dropping 1 rows with 49 observations due to NA

## 
## Dropping 1 rows with 651 observations due to NA

## 
## Dropping 1 rows with 49 observations due to NA

## 
## Dropping 1 rows with 49 observations due to NA

## 
## Dropping 1 rows with 49 observations due to NA

## 
## Dropping 1 rows with 49 observations due to NA

## 
## Removing 49 records due to NA

## 
## Removing 49 records due to NA

## 
## Removing 49 records due to NA

## 
## Removing 49 records due to NA

## 
## Removing 49 records due to NA

## 
## Removing 49 records due to NA

## 
## Removing 49 records due to NA

## 
## Removing 49 records due to NA

## 
##  *** Correlations use 8769 complete cases (99.4% of 8818 total) ***
##            TempC TempF  DewC  DewF Altimeter modSLP WindSpeed Visibility
## TempC       1.00  1.00  0.92  0.92     -0.17  -0.24     -0.10       0.14
## TempF       1.00  1.00  0.92  0.92     -0.17  -0.24     -0.10       0.14
## DewC        0.92  0.92  1.00  1.00     -0.22  -0.28     -0.18      -0.07
## DewF        0.92  0.92  1.00  1.00     -0.22  -0.28     -0.18      -0.07
## Altimeter  -0.17 -0.17 -0.22 -0.22      1.00   1.00     -0.37       0.15
## modSLP     -0.24 -0.24 -0.28 -0.28      1.00   1.00     -0.35       0.14
## WindSpeed  -0.10 -0.10 -0.18 -0.18     -0.37  -0.35      1.00       0.10
## Visibility  0.14  0.14 -0.07 -0.07      0.15   0.14      0.10       1.00
## monthint    0.24  0.24  0.32  0.32      0.19   0.17     -0.06      -0.11
## day         0.04  0.04  0.03  0.03     -0.02  -0.02      0.06       0.04
##            monthint   day
## TempC          0.24  0.04
## TempF          0.24  0.04
## DewC           0.32  0.03
## DewF           0.32  0.03
## Altimeter      0.19 -0.02
## modSLP         0.17 -0.02
## WindSpeed     -0.06  0.06
## Visibility    -0.11  0.04
## monthint       1.00  0.02
## day            0.02  1.00

## 
##  *** Regression call is: modSLP ~ Altimeter ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.96361 -0.44022 -0.03758  0.40772  1.41448 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -16.44389    0.74723  -22.01   <2e-16 ***
## Altimeter    34.41514    0.02488 1383.17   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4961 on 8767 degrees of freedom
##   (49 observations deleted due to missingness)
## Multiple R-squared:  0.9954, Adjusted R-squared:  0.9954 
## F-statistic: 1.913e+06 on 1 and 8767 DF,  p-value: < 2.2e-16

## 
##  *** Regression call is: modSLP ~ Altimeter + TempF ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.57802 -0.12674  0.00481  0.12820  0.65708 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.713e+00  2.803e-01  -13.25   <2e-16 ***
## Altimeter    3.403e+01  9.302e-03 3658.80   <2e-16 ***
## TempF       -2.351e-02  9.941e-05 -236.48   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1826 on 8766 degrees of freedom
##   (49 observations deleted due to missingness)
## Multiple R-squared:  0.9994, Adjusted R-squared:  0.9994 
## F-statistic: 7.086e+06 on 2 and 8766 DF,  p-value: < 2.2e-16

## 
##  *** Regression call is: TempF ~ DewF ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.132  -6.065  -2.083   4.042  31.029 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11.060663   0.211976   52.18   <2e-16 ***
## DewF         1.000977   0.004677  214.02   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.987 on 8767 degrees of freedom
##   (49 observations deleted due to missingness)
## Multiple R-squared:  0.8393, Adjusted R-squared:  0.8393 
## F-statistic: 4.58e+04 on 1 and 8767 DF,  p-value: < 2.2e-16

## 
##  *** Regression call is: WindSpeed ~ Altimeter + TempF + DewF + month ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.3544  -2.5454  -0.1494   2.3669  20.6042 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 306.827720   6.726512  45.615  < 2e-16 ***
## Altimeter    -9.980907   0.222840 -44.790  < 2e-16 ***
## TempF         0.206384   0.006044  34.145  < 2e-16 ***
## DewF         -0.217533   0.006755 -32.204  < 2e-16 ***
## monthFeb      0.188758   0.204057   0.925   0.3550    
## monthMar     -1.375926   0.211244  -6.513 7.75e-11 ***
## monthApr     -2.406187   0.217870 -11.044  < 2e-16 ***
## monthMay     -3.616016   0.248432 -14.555  < 2e-16 ***
## monthJun     -4.337974   0.277532 -15.631  < 2e-16 ***
## monthJul     -3.428235   0.300872 -11.394  < 2e-16 ***
## monthAug     -2.971582   0.312087  -9.522  < 2e-16 ***
## monthSep     -1.839206   0.292614  -6.285 3.43e-10 ***
## monthOct     -0.477868   0.255640  -1.869   0.0616 .  
## monthNov     -0.013008   0.226830  -0.057   0.9543    
## monthDec      2.071836   0.200848  10.315  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.883 on 8754 degrees of freedom
##   (49 observations deleted due to missingness)
## Multiple R-squared:  0.3218, Adjusted R-squared:  0.3207 
## F-statistic: 296.7 on 14 and 8754 DF,  p-value: < 2.2e-16

# Retrieve the Las Vegas, NV (2016) data
klas_2016 <- combinedEDA("metar_klas_2016.rds")
## 
## Will use Las Vegas, NV (2016) as the description for metar_klas_2016.rds
## 
## 
## Dropping 1 rows with 35 observations due to NA

## 
## Dropping 1 rows with 35 observations due to NA

## 
## Dropping 1 rows with 35 observations due to NA

## 
## Dropping 1 rows with 35 observations due to NA

## 
## Dropping 1 rows with 35 observations due to NA

## 
## Dropping 1 rows with 35 observations due to NA

## 
## Dropping 1 rows with 2440 observations due to NA

## 
## Dropping 1 rows with 35 observations due to NA

## 
## Dropping 1 rows with 35 observations due to NA

## 
## Dropping 1 rows with 35 observations due to NA

## 
## Dropping 1 rows with 35 observations due to NA

## 
## Removing 35 records due to NA

## 
## Removing 35 records due to NA

## 
## Removing 35 records due to NA

## 
## Removing 35 records due to NA

## 
## Removing 35 records due to NA

## 
## Removing 35 records due to NA

## 
## Removing 35 records due to NA

## 
## Removing 35 records due to NA

## 
##  *** Correlations use 8783 complete cases (99.6% of 8818 total) ***
##            TempC TempF  DewC  DewF Altimeter modSLP WindSpeed Visibility
## TempC       1.00  1.00  0.22  0.22     -0.51  -0.62      0.22       0.01
## TempF       1.00  1.00  0.22  0.22     -0.51  -0.62      0.22       0.01
## DewC        0.22  0.22  1.00  1.00     -0.24  -0.27     -0.04      -0.13
## DewF        0.22  0.22  1.00  1.00     -0.24  -0.27     -0.04      -0.13
## Altimeter  -0.51 -0.51 -0.24 -0.24      1.00   0.99     -0.38       0.06
## modSLP     -0.62 -0.62 -0.27 -0.27      0.99   1.00     -0.38       0.06
## WindSpeed   0.22  0.22 -0.04 -0.04     -0.38  -0.38      1.00      -0.02
## Visibility  0.01  0.01 -0.13 -0.13      0.06   0.06     -0.02       1.00
## monthint    0.14  0.14  0.06  0.06      0.06   0.03     -0.01      -0.02
## day        -0.01 -0.01  0.03  0.03     -0.01  -0.01      0.00      -0.07
##            monthint   day
## TempC          0.14 -0.01
## TempF          0.14 -0.01
## DewC           0.06  0.03
## DewF           0.06  0.03
## Altimeter      0.06 -0.01
## modSLP         0.03 -0.01
## WindSpeed     -0.01  0.00
## Visibility    -0.02 -0.07
## monthint       1.00  0.02
## day            0.02  1.00

## 
##  *** Regression call is: modSLP ~ Altimeter ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.1381 -0.7848 -0.0607  0.6891  3.6362 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -116.30984    1.72975  -67.24   <2e-16 ***
## Altimeter     37.68826    0.05776  652.51   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9753 on 8781 degrees of freedom
##   (35 observations deleted due to missingness)
## Multiple R-squared:  0.9798, Adjusted R-squared:  0.9798 
## F-statistic: 4.258e+05 on 1 and 8781 DF,  p-value: < 2.2e-16

## 
##  *** Regression call is: modSLP ~ Altimeter + TempF ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.18161 -0.33113  0.02395  0.33395  1.08509 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.537e+01  8.693e-01  -29.18   <2e-16 ***
## Altimeter    3.478e+01  2.868e-02 1212.89   <2e-16 ***
## TempF       -5.581e-02  2.813e-04 -198.42   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4165 on 8780 degrees of freedom
##   (35 observations deleted due to missingness)
## Multiple R-squared:  0.9963, Adjusted R-squared:  0.9963 
## F-statistic: 1.187e+06 on 2 and 8780 DF,  p-value: < 2.2e-16

## 
##  *** Regression call is: TempF ~ DewF ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -38.026 -14.381  -0.341  13.299  46.312 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 62.12523    0.49251  126.14   <2e-16 ***
## DewF         0.32810    0.01576   20.82   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.94 on 8781 degrees of freedom
##   (35 observations deleted due to missingness)
## Multiple R-squared:  0.04705,    Adjusted R-squared:  0.04694 
## F-statistic: 433.5 on 1 and 8781 DF,  p-value: < 2.2e-16

## 
##  *** Regression call is: WindSpeed ~ Altimeter + TempF + DewF + month ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.1199  -2.7665  -0.6134   2.0975  22.2630 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 316.166903   9.583584  32.990  < 2e-16 ***
## Altimeter   -10.363527   0.316290 -32.766  < 2e-16 ***
## TempF         0.012196   0.005239   2.328 0.019933 *  
## DewF         -0.053040   0.004107 -12.915  < 2e-16 ***
## monthFeb      1.477528   0.225729   6.546 6.26e-11 ***
## monthMar      0.848312   0.232319   3.652 0.000262 ***
## monthApr      1.590035   0.242532   6.556 5.84e-11 ***
## monthMay      0.327608   0.261841   1.251 0.210905    
## monthJun      0.141204   0.319290   0.442 0.658325    
## monthJul      1.092983   0.328362   3.329 0.000876 ***
## monthAug      0.545515   0.319165   1.709 0.087450 .  
## monthSep      0.647423   0.281377   2.301 0.021420 *  
## monthOct      1.147640   0.253286   4.531 5.95e-06 ***
## monthNov      1.097005   0.227198   4.828 1.40e-06 ***
## monthDec      0.672432   0.214810   3.130 0.001752 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.181 on 8768 degrees of freedom
##   (35 observations deleted due to missingness)
## Multiple R-squared:  0.1774, Adjusted R-squared:  0.176 
## F-statistic:   135 on 14 and 8768 DF,  p-value: < 2.2e-16

# Retrieve the New Orleans, LA (2016) data
kmsy_2016 <- combinedEDA("metar_kmsy_2016.rds")
## 
## Will use New Orleans, LA (2016) as the description for metar_kmsy_2016.rds
## 
## 
## Dropping 1 rows with 14 observations due to NA

## 
## Dropping 1 rows with 14 observations due to NA

## 
## Dropping 1 rows with 14 observations due to NA

## 
## Dropping 1 rows with 14 observations due to NA

## 
## Dropping 1 rows with 14 observations due to NA

## 
## Dropping 1 rows with 14 observations due to NA

## 
## Dropping 1 rows with 1029 observations due to NA

## 
## Dropping 1 rows with 14 observations due to NA

## 
## Dropping 1 rows with 14 observations due to NA

## 
## Dropping 1 rows with 14 observations due to NA

## 
## Dropping 1 rows with 14 observations due to NA

## 
## Removing 14 records due to NA

## 
## Removing 14 records due to NA

## 
## Removing 14 records due to NA

## 
## Removing 14 records due to NA

## 
## Removing 14 records due to NA

## 
## Removing 14 records due to NA

## 
## Removing 14 records due to NA

## 
## Removing 14 records due to NA

## 
##  *** Correlations use 8799 complete cases (99.8% of 8813 total) ***
##            TempC TempF  DewC  DewF Altimeter modSLP WindSpeed Visibility
## TempC       1.00  1.00  0.85  0.85     -0.48  -0.47     -0.09       0.13
## TempF       1.00  1.00  0.85  0.85     -0.48  -0.48     -0.09       0.13
## DewC        0.85  0.85  1.00  1.00     -0.56  -0.56     -0.21      -0.06
## DewF        0.85  0.85  1.00  1.00     -0.57  -0.56     -0.21      -0.05
## Altimeter  -0.48 -0.48 -0.56 -0.57      1.00   1.00     -0.08       0.08
## modSLP     -0.47 -0.48 -0.56 -0.56      1.00   1.00     -0.08       0.08
## WindSpeed  -0.09 -0.09 -0.21 -0.21     -0.08  -0.08      1.00       0.03
## Visibility  0.13  0.13 -0.06 -0.05      0.08   0.08      0.03       1.00
## monthint    0.26  0.26  0.31  0.31      0.09   0.09     -0.22      -0.02
## day         0.03  0.03  0.07  0.07      0.02   0.02     -0.05       0.01
##            monthint   day
## TempC          0.26  0.03
## TempF          0.26  0.03
## DewC           0.31  0.07
## DewF           0.31  0.07
## Altimeter      0.09  0.02
## modSLP         0.09  0.02
## WindSpeed     -0.22 -0.05
## Visibility    -0.02  0.01
## monthint       1.00  0.02
## day            0.02  1.00

## 
##  *** Regression call is: modSLP ~ Altimeter ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.232515 -0.083700  0.000544  0.084787  0.227320 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.773456   0.230067   12.05   <2e-16 ***
## Altimeter   33.779607   0.007654 4413.27   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1015 on 8797 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:  0.9995, Adjusted R-squared:  0.9995 
## F-statistic: 1.948e+07 on 1 and 8797 DF,  p-value: < 2.2e-16

## 
##  *** Regression call is: modSLP ~ Altimeter + TempF ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.222066 -0.082359 -0.001036  0.083533  0.217460 
## 
## Coefficients:
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 1.557e+00  2.638e-01    5.903  3.7e-09 ***
## Altimeter   3.382e+01  8.667e-03 3902.029  < 2e-16 ***
## TempF       8.751e-04  9.432e-05    9.277  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.101 on 8796 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:  0.9996, Adjusted R-squared:  0.9996 
## F-statistic: 9.833e+06 on 2 and 8796 DF,  p-value: < 2.2e-16

## 
##  *** Regression call is: TempF ~ DewF ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.051  -4.777  -1.265   4.199  26.477 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 24.440028   0.319858   76.41   <2e-16 ***
## DewF         0.778877   0.005065  153.77   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.763 on 8797 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:  0.7288, Adjusted R-squared:  0.7288 
## F-statistic: 2.364e+04 on 1 and 8797 DF,  p-value: < 2.2e-16

## 
##  *** Regression call is: WindSpeed ~ Altimeter + TempF + DewF + month ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.9452  -2.5710  -0.2247   2.2592  18.9024 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 190.986029  11.812622  16.168  < 2e-16 ***
## Altimeter    -6.223347   0.387357 -16.066  < 2e-16 ***
## TempF         0.225378   0.007290  30.915  < 2e-16 ***
## DewF         -0.169883   0.006513 -26.082  < 2e-16 ***
## monthFeb     -0.787460   0.204817  -3.845 0.000122 ***
## monthMar     -0.270554   0.215758  -1.254 0.209885    
## monthApr     -1.941174   0.225642  -8.603  < 2e-16 ***
## monthMay     -3.763407   0.241926 -15.556  < 2e-16 ***
## monthJun     -5.215285   0.269852 -19.326  < 2e-16 ***
## monthJul     -5.568010   0.283515 -19.639  < 2e-16 ***
## monthAug     -4.834769   0.277122 -17.446  < 2e-16 ***
## monthSep     -5.769099   0.272789 -21.149  < 2e-16 ***
## monthOct     -4.812814   0.245841 -19.577  < 2e-16 ***
## monthNov     -2.827680   0.220138 -12.845  < 2e-16 ***
## monthDec     -0.203742   0.210552  -0.968 0.333241    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.847 on 8784 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:  0.2278, Adjusted R-squared:  0.2266 
## F-statistic: 185.1 on 14 and 8784 DF,  p-value: < 2.2e-16

Directing Output to a Separate File

There is a lot of good information in the EDA, but it can be overwhelming to have everything in one place. Perhaps a wrapper function can be built allowing for outputs to be piped to a given file:

wrapCombinedEDA <- function(readFile, 
                            readPath="./RInputFiles/ProcessedMETAR/", 
                            mapFileNames=cityNameMapper,
                            desc=NULL,
                            writeLogFile=NULL,
                            writeLogPDF=NULL,
                            writeLogPath=NULL,
                            appendWriteFile=FALSE,
                            ...
                            ) {
    
    # Read in the requested file
    tbl <- readRDS(paste0(readPath, readFile))

    # Find the description if it has not been passed
    if (is.null(desc)) {
        desc <- getLocaleDescription(readFile, mapper=mapFileNames)
    }
    
    # Helper function that only runs the combinedEDA() routine
    coreFunc <- function() { combinedEDA(tbl=tbl, desc=desc, mapFileNames=mapFileNames, ...) }
    
    # If writeLogPDF is not NULL, direct the graphs to a suitable PDF
    if (!is.null(writeLogPDF)) {
        
        # Prepend the provided log path if it has not been made available
        if (!is.null(writeLogPath)) {
            writeLogPDF <- paste0(writeLogPath, writeLogPDF)
        }
        
        # Provide the location of the EDA pdf file
        cat("\nEDA PDF file is available at:", writeLogPDF, "\n")

        # Redirect the writing to writeLogPDF
        pdf(writeLogPDF)
    }
    
    # Run EDA on the tbl using capture.output to redirect to a log file if specified
    if (!is.null(writeLogFile)) {
        
        # Prepend the provided log path if it has not been made available
        if (!is.null(writeLogPath)) {
            writeLogFile <- paste0(writeLogPath, writeLogFile)
        }
        
        # Provide the location of the EDA log file
        cat("\nEDA log file is available at:", writeLogFile, "\n")
        
        # Run EDA such that the output goes to the log file
        capture.output(coreFunc(), 
                       file=writeLogFile, 
                       append=appendWriteFile
                       )
        
    } else {
        # Run EDA such that output stays in stdout
        coreFunc()
    }
    
    # If writeLogPDF is not NULL, redirect to stdout
    if (!is.null(writeLogPDF)) {
        dev.off()
    }
    
    # Return the tbl
    tbl
    
}
filePath <- "./RInputFiles/ProcessedMETAR/"

# Example for the basic function for Chicago, IL (2016) written to stdout
kord_2016 <- wrapCombinedEDA("metar_kord_2016.rds", readPath=filePath)
## 
## Will use Chicago, IL (2016) as the description for metar_kord_2016.rds
## 
## 
## Dropping 1 rows with 10 observations due to NA

## 
## Dropping 1 rows with 10 observations due to NA

## 
## Dropping 1 rows with 10 observations due to NA

## 
## Dropping 1 rows with 10 observations due to NA

## 
## Dropping 1 rows with 10 observations due to NA

## 
## Dropping 1 rows with 10 observations due to NA

## 
## Dropping 1 rows with 823 observations due to NA

## 
## Dropping 1 rows with 10 observations due to NA

## 
## Dropping 1 rows with 10 observations due to NA

## 
## Dropping 1 rows with 10 observations due to NA

## 
## Dropping 1 rows with 10 observations due to NA

## 
## Removing 10 records due to NA

## 
## Removing 10 records due to NA

## 
## Removing 10 records due to NA

## 
## Removing 10 records due to NA

## 
## Removing 10 records due to NA

## 
## Removing 10 records due to NA

## 
## Removing 10 records due to NA

## 
## Removing 10 records due to NA

## 
##  *** Correlations use 8805 complete cases (99.9% of 8815 total) ***
##            TempC TempF  DewC  DewF Altimeter modSLP WindSpeed Visibility
## TempC       1.00  1.00  0.93  0.93     -0.22  -0.29     -0.12       0.19
## TempF       1.00  1.00  0.93  0.93     -0.22  -0.29     -0.12       0.19
## DewC        0.93  0.93  1.00  1.00     -0.27  -0.34     -0.19       0.05
## DewF        0.93  0.93  1.00  1.00     -0.28  -0.34     -0.19       0.05
## Altimeter  -0.22 -0.22 -0.27 -0.28      1.00   1.00     -0.31       0.19
## modSLP     -0.29 -0.29 -0.34 -0.34      1.00   1.00     -0.29       0.17
## WindSpeed  -0.12 -0.12 -0.19 -0.19     -0.31  -0.29      1.00       0.01
## Visibility  0.19  0.19  0.05  0.05      0.19   0.17      0.01       1.00
## monthint    0.24  0.24  0.27  0.27      0.18   0.16     -0.09       0.02
## day         0.05  0.05  0.05  0.05     -0.06  -0.06      0.08       0.04
##            monthint   day
## TempC          0.24  0.05
## TempF          0.24  0.05
## DewC           0.27  0.05
## DewF           0.27  0.05
## Altimeter      0.18 -0.06
## modSLP         0.16 -0.06
## WindSpeed     -0.09  0.08
## Visibility     0.02  0.04
## monthint       1.00  0.02
## day            0.02  1.00

## 
##  *** Regression call is: modSLP ~ Altimeter ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.31041 -0.47561 -0.07852  0.43888  1.66927 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -23.11596    0.85978  -26.89   <2e-16 ***
## Altimeter    34.63779    0.02864 1209.37   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5573 on 8803 degrees of freedom
##   (10 observations deleted due to missingness)
## Multiple R-squared:  0.994,  Adjusted R-squared:  0.994 
## F-statistic: 1.463e+06 on 1 and 8803 DF,  p-value: < 2.2e-16

## 
##  *** Regression call is: modSLP ~ Altimeter + TempF ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.69460 -0.12269  0.00055  0.12142  0.73628 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.603e+00  2.864e-01  -16.07   <2e-16 ***
## Altimeter    3.407e+01  9.502e-03 3585.30   <2e-16 ***
## TempF       -2.606e-02  9.503e-05 -274.26   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1804 on 8802 degrees of freedom
##   (10 observations deleted due to missingness)
## Multiple R-squared:  0.9994, Adjusted R-squared:  0.9994 
## F-statistic: 7.018e+06 on 2 and 8802 DF,  p-value: < 2.2e-16

## 
##  *** Regression call is: TempF ~ DewF ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.638  -5.485  -1.570   3.601  35.456 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.317877   0.188557   54.72   <2e-16 ***
## DewF         1.004504   0.004095  245.33   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.407 on 8803 degrees of freedom
##   (10 observations deleted due to missingness)
## Multiple R-squared:  0.8724, Adjusted R-squared:  0.8724 
## F-statistic: 6.019e+04 on 1 and 8803 DF,  p-value: < 2.2e-16

## 
##  *** Regression call is: WindSpeed ~ Altimeter + TempF + DewF + month ***
## 
## Call:
## lm(formula = formula(myChar), data = met)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.295  -2.635  -0.157   2.494  21.356 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 291.341063   7.156664  40.709  < 2e-16 ***
## Altimeter    -9.415029   0.237118 -39.706  < 2e-16 ***
## TempF         0.160498   0.006445  24.904  < 2e-16 ***
## DewF         -0.205407   0.006912 -29.718  < 2e-16 ***
## monthFeb      0.649436   0.210063   3.092 0.001997 ** 
## monthMar      0.033603   0.220448   0.152 0.878850    
## monthApr      0.092196   0.230608   0.400 0.689317    
## monthMay     -0.859361   0.258300  -3.327 0.000882 ***
## monthJun     -1.564544   0.296635  -5.274 1.36e-07 ***
## monthJul     -1.191767   0.314437  -3.790 0.000152 ***
## monthAug     -1.319054   0.319647  -4.127 3.72e-05 ***
## monthSep      0.394263   0.299972   1.314 0.188768    
## monthOct      0.511074   0.264003   1.936 0.052916 .  
## monthNov      0.050679   0.235873   0.215 0.829882    
## monthDec      1.741405   0.205243   8.485  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.984 on 8790 degrees of freedom
##   (10 observations deleted due to missingness)
## Multiple R-squared:  0.2494, Adjusted R-squared:  0.2482 
## F-statistic: 208.6 on 14 and 8790 DF,  p-value: < 2.2e-16

# Example for the basic function for San Diego, CA (2016) written to 'metar_ksan_2016_EDA.log'
ksan_2016 <- wrapCombinedEDA("metar_ksan_2016.rds", 
                             readPath=filePath, 
                             writeLogFile='metar_ksan_2016_EDA.log',
                             writeLogPDF='metar_ksan_2016_EDA.pdf', 
                             writeLogPath=filePath
                             )
## 
## Will use San Diego, CA (2016) as the description for metar_ksan_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2016_EDA.log

Finally, a function can be called to create the inputs to wrapCombinedEDA() for lower typing:

logAndPDFCombinedEDA <- function(tblName, filePath="./RInputFiles/ProcessedMETAR/") {
    
    # Create the RDS file name
    rdsName <- paste0("metar_", tblName, ".rds")
    cat("\nRDS Name:", rdsName)
    
    # Create the log file name
    logName <- paste0("metar_", tblName, "_EDA.log")
    cat("\nLog Name:", logName)
    
    # Create the PDF file name
    pdfName <- paste0("metar_", tblName, "_EDA.pdf")
    cat("\nPDF Name:", pdfName)
    
    # Call wrapCombinedEDA()
    tbl <- wrapCombinedEDA(rdsName, 
                           readPath=filePath, 
                           writeLogFile=logName, 
                           writeLogPDF=pdfName,
                           writeLogPath=filePath
                           )
    
    # Return the tbl
    tbl
    
}

This function can then be run for all of the relevant files (cached to avoid multiple runs):

# Run for 2016 only for kdtw, kewr, kgrb, kgrr, kiah, kind, klnk, kmke, kmsn, kmsp, ktvc
kdtw_2016 <- logAndPDFCombinedEDA("kdtw_2016")
## 
## RDS Name: metar_kdtw_2016.rds
## Log Name: metar_kdtw_2016_EDA.log
## PDF Name: metar_kdtw_2016_EDA.pdf
## Will use Detroit, MI (2016) as the description for metar_kdtw_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kdtw_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kdtw_2016_EDA.log
kewr_2016 <- logAndPDFCombinedEDA("kewr_2016")
## 
## RDS Name: metar_kewr_2016.rds
## Log Name: metar_kewr_2016_EDA.log
## PDF Name: metar_kewr_2016_EDA.pdf
## Will use Newark, NJ (2016) as the description for metar_kewr_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kewr_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kewr_2016_EDA.log
kgrb_2016 <- logAndPDFCombinedEDA("kgrb_2016")
## 
## RDS Name: metar_kgrb_2016.rds
## Log Name: metar_kgrb_2016_EDA.log
## PDF Name: metar_kgrb_2016_EDA.pdf
## Will use Green Bay, WI (2016) as the description for metar_kgrb_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kgrb_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kgrb_2016_EDA.log
kgrr_2016 <- logAndPDFCombinedEDA("kgrr_2016")
## 
## RDS Name: metar_kgrr_2016.rds
## Log Name: metar_kgrr_2016_EDA.log
## PDF Name: metar_kgrr_2016_EDA.pdf
## Will use Grand Rapids, MI (2016) as the description for metar_kgrr_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kgrr_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kgrr_2016_EDA.log
kiah_2016 <- logAndPDFCombinedEDA("kiah_2016")
## 
## RDS Name: metar_kiah_2016.rds
## Log Name: metar_kiah_2016_EDA.log
## PDF Name: metar_kiah_2016_EDA.pdf
## Will use Houston, TX (2016) as the description for metar_kiah_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kiah_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kiah_2016_EDA.log
kind_2016 <- logAndPDFCombinedEDA("kind_2016")
## 
## RDS Name: metar_kind_2016.rds
## Log Name: metar_kind_2016_EDA.log
## PDF Name: metar_kind_2016_EDA.pdf
## Will use Indianapolis, IN (2016) as the description for metar_kind_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kind_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kind_2016_EDA.log
klnk_2016 <- logAndPDFCombinedEDA("klnk_2016")
## 
## RDS Name: metar_klnk_2016.rds
## Log Name: metar_klnk_2016_EDA.log
## PDF Name: metar_klnk_2016_EDA.pdf
## Will use Lincoln, NE (2016) as the description for metar_klnk_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_klnk_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_klnk_2016_EDA.log
kmke_2016 <- logAndPDFCombinedEDA("kmke_2016")
## 
## RDS Name: metar_kmke_2016.rds
## Log Name: metar_kmke_2016_EDA.log
## PDF Name: metar_kmke_2016_EDA.pdf
## Will use Milwaukee, WI (2016) as the description for metar_kmke_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kmke_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kmke_2016_EDA.log
kmsn_2016 <- logAndPDFCombinedEDA("kmsn_2016")
## 
## RDS Name: metar_kmsn_2016.rds
## Log Name: metar_kmsn_2016_EDA.log
## PDF Name: metar_kmsn_2016_EDA.pdf
## Will use Madison, WI (2016) as the description for metar_kmsn_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsn_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsn_2016_EDA.log
kmsp_2016 <- logAndPDFCombinedEDA("kmsp_2016")
## 
## RDS Name: metar_kmsp_2016.rds
## Log Name: metar_kmsp_2016_EDA.log
## PDF Name: metar_kmsp_2016_EDA.pdf
## Will use Minneapolis, MN (2016) as the description for metar_kmsp_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsp_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsp_2016_EDA.log
ktvc_2016 <- logAndPDFCombinedEDA("ktvc_2016")
## 
## RDS Name: metar_ktvc_2016.rds
## Log Name: metar_ktvc_2016_EDA.log
## PDF Name: metar_ktvc_2016_EDA.pdf
## Will use Traverse City, MI (2016) as the description for metar_ktvc_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_ktvc_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_ktvc_2016_EDA.log
# Run for 2015-2016-2017 for klas, kmsy, kord, ksan
klas_2015 <- logAndPDFCombinedEDA("klas_2015")
## 
## RDS Name: metar_klas_2015.rds
## Log Name: metar_klas_2015_EDA.log
## PDF Name: metar_klas_2015_EDA.pdf
## Will use Las Vegas, NV (2015) as the description for metar_klas_2015.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_klas_2015_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_klas_2015_EDA.log
klas_2016 <- logAndPDFCombinedEDA("klas_2016")
## 
## RDS Name: metar_klas_2016.rds
## Log Name: metar_klas_2016_EDA.log
## PDF Name: metar_klas_2016_EDA.pdf
## Will use Las Vegas, NV (2016) as the description for metar_klas_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_klas_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_klas_2016_EDA.log
klas_2017 <- logAndPDFCombinedEDA("klas_2017")
## 
## RDS Name: metar_klas_2017.rds
## Log Name: metar_klas_2017_EDA.log
## PDF Name: metar_klas_2017_EDA.pdf
## Will use Las Vegas, NV (2017) as the description for metar_klas_2017.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_klas_2017_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_klas_2017_EDA.log
kmsy_2015 <- logAndPDFCombinedEDA("kmsy_2015")
## 
## RDS Name: metar_kmsy_2015.rds
## Log Name: metar_kmsy_2015_EDA.log
## PDF Name: metar_kmsy_2015_EDA.pdf
## Will use New Orleans, LA (2015) as the description for metar_kmsy_2015.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsy_2015_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsy_2015_EDA.log
kmsy_2016 <- logAndPDFCombinedEDA("kmsy_2016")
## 
## RDS Name: metar_kmsy_2016.rds
## Log Name: metar_kmsy_2016_EDA.log
## PDF Name: metar_kmsy_2016_EDA.pdf
## Will use New Orleans, LA (2016) as the description for metar_kmsy_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsy_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsy_2016_EDA.log
kmsy_2017 <- logAndPDFCombinedEDA("kmsy_2017")
## 
## RDS Name: metar_kmsy_2017.rds
## Log Name: metar_kmsy_2017_EDA.log
## PDF Name: metar_kmsy_2017_EDA.pdf
## Will use New Orleans, LA (2017) as the description for metar_kmsy_2017.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsy_2017_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsy_2017_EDA.log
kord_2015 <- logAndPDFCombinedEDA("kord_2015")
## 
## RDS Name: metar_kord_2015.rds
## Log Name: metar_kord_2015_EDA.log
## PDF Name: metar_kord_2015_EDA.pdf
## Will use Chicago, IL (2015) as the description for metar_kord_2015.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kord_2015_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kord_2015_EDA.log
kord_2016 <- logAndPDFCombinedEDA("kord_2016")
## 
## RDS Name: metar_kord_2016.rds
## Log Name: metar_kord_2016_EDA.log
## PDF Name: metar_kord_2016_EDA.pdf
## Will use Chicago, IL (2016) as the description for metar_kord_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kord_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kord_2016_EDA.log
kord_2017 <- logAndPDFCombinedEDA("kord_2017")
## 
## RDS Name: metar_kord_2017.rds
## Log Name: metar_kord_2017_EDA.log
## PDF Name: metar_kord_2017_EDA.pdf
## Will use Chicago, IL (2017) as the description for metar_kord_2017.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kord_2017_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kord_2017_EDA.log
ksan_2015 <- logAndPDFCombinedEDA("ksan_2015")
## 
## RDS Name: metar_ksan_2015.rds
## Log Name: metar_ksan_2015_EDA.log
## PDF Name: metar_ksan_2015_EDA.pdf
## Will use San Diego, CA (2015) as the description for metar_ksan_2015.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2015_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2015_EDA.log
ksan_2016 <- logAndPDFCombinedEDA("ksan_2016")
## 
## RDS Name: metar_ksan_2016.rds
## Log Name: metar_ksan_2016_EDA.log
## PDF Name: metar_ksan_2016_EDA.pdf
## Will use San Diego, CA (2016) as the description for metar_ksan_2016.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2016_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2016_EDA.log
ksan_2017 <- logAndPDFCombinedEDA("ksan_2017")
## 
## RDS Name: metar_ksan_2017.rds
## Log Name: metar_ksan_2017_EDA.log
## PDF Name: metar_ksan_2017_EDA.pdf
## Will use San Diego, CA (2017) as the description for metar_ksan_2017.rds
## 
## 
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2017_EDA.pdf 
## 
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2017_EDA.log

The files are now available for further analysis, with individual PDF files for plots associated with each locale.

Comparing Multiple Locales

With EDA about each locale saved to .pdf and .log files, it is interesting to investigate comparisons among the various locales. For example, how do the temperature patterns by month for a given locale compare to the overall global median temperature patterns by month?

The existing functions contain most of the code needed to perform this. The main steps to add are:

  1. Combine processed data files
  2. Adapt the base functions to create the overall mean or median, when requested by parameter
  3. Adapt the base functions to add the overall mean or median to the plot, when requested by parameter
  4. Facet the plot by locale, when requested by parameter

The first step is to combine one or more processed files, with a column added for locale:

# Combine files from a character list
combineProcessedFiles <- function(charList, mapper=cityNameMapper) {
    
    # Combine the objects represented by charList, and name the list items using charList
    listFiles <- lapply(charList, FUN=function(x) { get(x) })
    names(listFiles) <- charList
    
    # Bind rows, and add the descriptive locale name as sourceName
    tblFiles <- bind_rows(listFiles, .id="source") %>%
        mutate(sourceName=mapper[source])
    
    tblFiles
    
}

The 2016 data will be used to run the combined process:

# Grab all the data that ends in _2016
locales2016 <- ls() %>%
    grep(pattern="_2016", value=TRUE)
cat("\nLocales used will be:\n\n", paste0(locales2016, collapse="\n"), "\n\n", sep="")
## 
## Locales used will be:
## 
## kdtw_2016
## kewr_2016
## kgrb_2016
## kgrr_2016
## kiah_2016
## kind_2016
## klas_2016
## klnk_2016
## kmke_2016
## kmsn_2016
## kmsp_2016
## kmsy_2016
## kord_2016
## ksan_2016
## ktvc_2016
# Combine the 2016 data
all2016Data <- combineProcessedFiles(locales2016)

# Show counts by sourceName
all2016Data %>%
    count(source, sourceName)
## # A tibble: 15 x 3
##    source    sourceName                   n
##    <chr>     <chr>                    <int>
##  1 kdtw_2016 Detroit, MI (2016)        8818
##  2 kewr_2016 Newark, NJ (2016)         8821
##  3 kgrb_2016 Green Bay, WI (2016)      8803
##  4 kgrr_2016 Grand Rapids, MI (2016)   8812
##  5 kiah_2016 Houston, TX (2016)        8816
##  6 kind_2016 Indianapolis, IN (2016)   8767
##  7 klas_2016 Las Vegas, NV (2016)      8818
##  8 klnk_2016 Lincoln, NE (2016)        8813
##  9 kmke_2016 Milwaukee, WI (2016)      8808
## 10 kmsn_2016 Madison, WI (2016)        8798
## 11 kmsp_2016 Minneapolis, MN (2016)    8817
## 12 kmsy_2016 New Orleans, LA (2016)    8813
## 13 kord_2016 Chicago, IL (2016)        8815
## 14 ksan_2016 San Diego, CA (2016)      8810
## 15 ktvc_2016 Traverse City, MI (2016)  8814

The following global summaries will be useful:

  • plotCountsByMetric() - overall percentage by metric, applied to total counts for locale
  • plotNumCor() - overall geom_smooth()
  • plotFactorNumeric() - overall mean/median of numeric by factor
  • corMETAR() and lmMETAR() - not applicable, though could be run on full dataset
  • basicWindPlots() - tbd, perhaps adapt along with consolidatePlotWind
  • consolidatePlotWind() - tbd, perhaps adapt along with basicWindPlots

Helper functions can be created for:

  • helperCountsByMetric() - get the overall percentage by metric for variable x
  • helperNumCor() - get an overall geom_smooth for variable y vs. variable x
  • helperFactorNumeric() - get the overall mean or median for numeric variable y by factor variable x

The function helperFactorNumeric is created to apply function f to numeric variable y by factor variable x:

# Helper function to get overall percentage by metric for variable x
helperCountsByMetric <- function(tbl, ctVar, sumOn="dummyVar") {

    tbl %>%
        mutate(dummyVar=1) %>%
        select_at(vars(all_of(c(ctVar, sumOn)))) %>%
        filter_all(all_vars(!is.na(.))) %>%
        group_by_at(ctVar) %>%
        summarize(n=sum(get(sumOn))) %>%
        mutate(nPct=n/sum(n))
        
}

# Example run to get counts by greatest sky obscuration
helperCountsByMetric(all2016Data, ctVar="wType")
## # A tibble: 7 x 3
##   wType     n    nPct
##   <fct> <dbl>   <dbl>
## 1 VV      761 0.00576
## 2 OVC   39489 0.299  
## 3 BKN   31355 0.237  
## 4 SCT   16213 0.123  
## 5 FEW   19305 0.146  
## 6 CLR   24438 0.185  
## 7 Error   582 0.00440
# Helper function to get a geom_smooth for variable y vs variable x
helperNumCor <- function(tbl, 
                         xVar, 
                         yVar, 
                         se=TRUE, 
                         color="red", 
                         method="lm", 
                         lty=2
                         ) {
    
    plotData <- tbl %>%
        group_by_at(vars(all_of(c(xVar, yVar)))) %>%
        summarize(nTotal=n()) %>%
        filter_all(all_vars(!is.na(.)))
    
    geom_smooth(data=plotData, 
                aes_string(x=xVar, y=yVar, weight="nTotal"), 
                se=se, 
                color=color, 
                method=method, 
                lty=lty
                )
    
}

# Example run to get TempC vs DewC
helperNumCor(all2016Data, xVar="TempC", yVar="DewC")
## mapping: weight = ~nTotal, x = ~TempC, y = ~DewC 
## geom_smooth: na.rm = FALSE, se = TRUE
## stat_smooth: na.rm = FALSE, se = TRUE, method = lm, formula = y ~ x
## position_identity
# Example for using the helper function on a plot
plotNumCor(kdtw_2016, var1="TempC", var2="DewC") + 
    helperNumCor(all2016Data, xVar="TempC", yVar="DewC")

# Helper function to calculate .f(numVar) by byVar
helperFactorNumeric <- function(tbl, .f, byVar, numVar) {
    
    tbl %>%
        select_at(vars(all_of(c(byVar, numVar)))) %>%
        filter_all(all_vars(!is.na(.))) %>%
        group_by_at(byVar) %>%
        summarize(helpFN=.f(get(numVar)))
    
}

# Example for getting median TempF by month
helperFactorNumeric(all2016Data, .f=median, byVar="month", numVar="TempF")
## # A tibble: 12 x 2
##    month helpFN
##    <fct>  <dbl>
##  1 Jan     30.9
##  2 Feb     35.1
##  3 Mar     46.9
##  4 Apr     52.0
##  5 May     64.0
##  6 Jun     73.0
##  7 Jul     77  
##  8 Aug     75.9
##  9 Sep     71.1
## 10 Oct     61.0
## 11 Nov     50  
## 12 Dec     34.0

The function plotCountsByMetric() has been updated above to allow for facetting and plotting of the overall central tendency. Two examples are shown - the base from previous, and a facetted example:

# Previous Example for Detroit 2016 - using WindDir, cType1, month, wType
plotcountsByMetric(kdtw_2016, 
                   mets=c("WindDir", "cType1", "month", "wType"), 
                   title="Detroit, MI (2016)", 
                   dropNA=TRUE, 
                   diagnose=TRUE
                   )
## 
## Dropping 1 rows with 49 observations due to NA

# Facetted example for kdtw_2016, kord_2016, klas_2016, ksan_2016
useData <- all2016Data %>%
    filter(source %in% c("kdtw_2016", "klas_2016", "kord_2016", "ksan_2016"))

plotcountsByMetric(useData, 
                   mets=c("WindDir", "cType1", "month", "wType"), 
                   title="Comparison Across Locales (red dots are the median)", 
                   dropNA=TRUE, 
                   diagnose=TRUE, 
                   facetOn="sourceName", 
                   showCentral=TRUE
                   )
## 
## Dropping 4 rows with 117 observations due to NA

As observed previously, Las Vegas tends towards southerly winds while San Diego tends towards northwesterly winds and calm (direction 000) winds. Detroit is most likely to be overcast, while Las Vegas is most likely to be clear.

Next steps are to adapt the plotNumCor() and plotFactorNumeric() functions similarly.